Cross-Lingual Adaptation using Structural Correspondence Learning
Cross-lingual adaptation, a special case of domain adaptation, refers to the
transfer of classification knowledge between two languages. In this article we
describe an extension of Structural Correspondence Learning (SCL), a recently
proposed algorithm for domain adaptation, for cross-lingual adaptation. The
proposed method uses unlabeled documents from both languages, along with a word
translation oracle, to induce cross-lingual feature correspondences. From these
correspondences a cross-lingual representation is created that enables the
transfer of classification knowledge from the source to the target language.
The main advantages of this approach over other approaches are its resource
efficiency and task specificity.
We conduct experiments in the area of cross-language topic and sentiment
classification involving English as source language and German, French, and
Japanese as target languages. The results show a significant improvement of the
proposed method over a machine translation baseline, reducing the relative
error due to cross-lingual adaptation by an average of 30% (topic
classification) and 59% (sentiment classification). We further report on
empirical analyses that reveal insights into the use of unlabeled data, the
sensitivity with respect to important hyperparameters, and the nature of the
induced cross-lingual correspondences.
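The pivot-based mechanism behind SCL can be sketched as follows. Everything here is a hypothetical stand-in: the data is synthetic, the pivot indices play the role of word pairs supplied by the translation oracle, and plain least squares replaces the loss actually used for the pivot predictors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for unlabeled documents (bag-of-words rows)
# from both languages, stacked into one matrix.
n_docs, n_feats = 200, 50
X = rng.random((n_docs, n_feats))

# Hypothetical pivot features: indices of words whose cross-lingual
# correspondence is given by the word translation oracle.
pivots = [0, 1, 2, 3, 4]
non_pivot = [j for j in range(n_feats) if j not in pivots]

# For each pivot, fit a linear predictor of the pivot's occurrence
# from all non-pivot features (least squares as a simple stand-in).
W = []
for p in pivots:
    y = (X[:, p] > 0.5).astype(float)   # binarized pivot occurrence
    w, *_ = np.linalg.lstsq(X[:, non_pivot], y, rcond=None)
    W.append(w)
W = np.column_stack(W)                   # (n_non_pivot, n_pivots)

# An SVD of the pivot-predictor matrix yields a shared subspace;
# projecting documents onto it gives the cross-lingual representation
# on which a source-language classifier can be trained.
k = 3
U, _, _ = np.linalg.svd(W, full_matrices=False)
theta = U[:, :k].T                       # (k, n_non_pivot)
X_cross = X[:, non_pivot] @ theta.T      # (n_docs, k)
print(X_cross.shape)                     # -> (200, 3)
```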
The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants
Reasoning is a crucial part of natural language argumentation. To comprehend
an argument, one must analyze its warrant, which explains why its claim follows
from its premises. As arguments are highly contextualized, warrants are usually
presupposed and left implicit. Thus, comprehension requires not only
language understanding and logical skills but also common sense. In
this paper we develop a methodology for reconstructing warrants systematically.
We operationalize it in a scalable crowdsourcing process, resulting in a freely
licensed dataset with warrants for 2k authentic arguments from news comments.
On this basis, we present a new challenging task, the argument reasoning
comprehension task. Given an argument with a claim and a premise, the goal is
to choose the correct implicit warrant from two options. Both warrants are
plausible and lexically close, but lead to contradicting claims. A solution to
this task will define a substantial step towards automatic warrant
reconstruction. However, experiments with several neural attention and language
models reveal that current approaches do not suffice.

Comment: Accepted as NAACL 2018 Long Paper; see details on the front page.
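As a concrete instance of the task format, the sketch below scores two candidate warrants against the claim and premise by word overlap. This is a deliberately naive baseline on a made-up example instance, not one of the paper's models or dataset items.

```python
def score(warrant, context):
    """Word-overlap score between a warrant and the argument context
    (claim + premise); a naive stand-in for a learned scorer."""
    w = set(warrant.lower().split())
    c = set(context.lower().split())
    return len(w & c) / max(len(w), 1)

def choose_warrant(claim, premise, warrant0, warrant1):
    """Return the index (0 or 1) of the higher-scoring warrant."""
    context = claim + " " + premise
    return 0 if score(warrant0, context) >= score(warrant1, context) else 1

# Hypothetical task instance (not from the dataset):
claim = "Comment sections should be moderated"
premise = "Comment sections are full of abuse"
w0 = "abuse drives readers away from comment sections"
w1 = "moderation is expensive for publishers"
print(choose_warrant(claim, premise, w0, w1))  # -> 0
```

Because the dataset's warrant pairs are deliberately lexically close, such surface-overlap baselines are exactly what the task is designed to defeat.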
TIR 2015 Workshop Preface
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
Demanded Abstract Interpretation
Formal static analysis is seeing increasingly widespread adoption as a tool for verification and bug-finding, but even with powerful cloud infrastructure it can take minutes or hours for a
developer to get analysis results after a code change. This dissertation considers the problem of
making expressive and sophisticated static analyzers interactive by providing analysis results to
developers in as close to real time as possible. While existing techniques offer some demand-driven
or incremental aspects for certain classes of analysis, the fundamental challenge addressed by this
work is doing both for abstract interpretation in arbitrary domains.

This dissertation presents a technique, demanded abstract interpretation, that lifts analysis
computations to a dependency graph structure in which incremental program edits and
demand-driven evaluation of abstract semantics can be handled uniformly. Demanded abstract interpretation
draws inspiration from graph-based approaches to incremental computation, and is not only sound
and terminating but also from-scratch consistent with underlying batch analyses.
The approach is parametric in the choice of abstract domain, supporting a wide range of
analysis problems and enabling the reuse of highly-tuned existing domain implementations in our
demanded analysis framework without requiring any per-domain reasoning about incrementality or
demand. The complex, cyclic, and unbounded dependency structures that arise when analyzing
loops and recursive control flow in an infinite-height domain are a key challenge, which our approach
handles by dynamically extending novel acyclic encodings of such analysis computation.

This dissertation describes and formalizes demanded abstract interpretation techniques for
both intraprocedural analysis and compositional interprocedural analysis. We also present promising
experimental results in a prototype analysis implementation, and describe some extensions to the
framework designed to confront practical resource constraints without sacrificing formal guarantees.
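The core idea, lifting analysis computations onto a dependency graph where demand-driven evaluation and incremental edits are handled uniformly, can be illustrated with a miniature memoizing cell in the spirit of graph-based incremental computation. The interval-analysis example and all names below are illustrative, not the dissertation's actual framework.

```python
class Cell:
    """A node in the analysis dependency graph: computes its abstract
    value on demand, memoizes it, and records dependents so that a
    program edit dirties exactly the affected downstream results."""
    def __init__(self, compute, deps=()):
        self.compute = compute
        self.deps = list(deps)
        self.value = None
        self.dirty = True
        self.dependents = []
        for d in self.deps:
            d.dependents.append(self)

    def get(self):
        """Demand-driven evaluation: recompute only if dirty."""
        if self.dirty:
            self.value = self.compute(*(d.get() for d in self.deps))
            self.dirty = False
        return self.value

    def invalidate(self):
        """Incremental edit: transitively mark dependents dirty."""
        if not self.dirty:
            self.dirty = True
            for d in self.dependents:
                d.invalidate()

# Tiny interval-analysis flavored example (hypothetical program facts):
x = Cell(lambda: (0, 10))                         # x in [0, 10]
y = Cell(lambda xv: (xv[0] + 1, xv[1] + 1), [x])  # y = x + 1
print(y.get())          # -> (1, 11)
x.compute = lambda: (5, 5)                        # simulate a code edit
x.invalidate()
print(y.get())          # -> (6, 6), recomputed on demand
```

From-scratch consistency in this toy means that `y.get()` after an edit equals what a full reanalysis would produce; the dissertation's contribution is establishing this, plus soundness and termination, for arbitrary (including infinite-height, cyclic) abstract domains.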
Retrieval Models for Genre Classification
Genre provides a characterization of a document with respect to its form or functional trait. Genre is orthogonal to topic, rendering genre information a powerful filter technology for information seekers in digital libraries. However, an efficient means for genre classification is an open and controversially discussed issue. This paper gives an overview and presents new results related to automatic genre classification of text documents. We present a comprehensive survey which contrasts the genre retrieval models that have been developed for Web and non-Web corpora. With the concept of genre-specific core vocabularies the paper provides an original contribution related to computational aspects and classification performance of genre retrieval models: we show how such vocabularies are acquired automatically and introduce new concentration measures that quantify the vocabulary distribution in a sensible way. Based on these findings we construct lightweight genre retrieval models and evaluate their discriminative power and computational efficiency. The presented concepts go beyond the existing utilization of vocabulary-centered, genre-revealing features and open new possibilities for the construction of genre classifiers that operate in real time.
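The notions of a genre-specific core vocabulary and a concentration measure can be sketched as follows; the frequency-based acquisition and the token-share measure here are simplified stand-ins for the paper's constructions, and the sample documents are invented.

```python
from collections import Counter

def core_vocabulary(docs, top_k=3):
    """Naive acquisition of a genre's core vocabulary: the top_k most
    frequent terms across the genre's documents (an illustrative
    stand-in for the paper's automatic acquisition procedure)."""
    counts = Counter(w for d in docs for w in d.lower().split())
    return {w for w, _ in counts.most_common(top_k)}

def concentration(doc, core):
    """Share of a document's tokens drawn from the core vocabulary:
    a simple concentration measure over the vocabulary distribution."""
    tokens = doc.lower().split()
    return sum(t in core for t in tokens) / max(len(tokens), 1)

# Hypothetical documents of one genre:
news = ["the minister announced the budget today",
        "the parliament passed the budget law"]
core = core_vocabulary(news)
print(concentration("the budget debate continues", core))
```

A lightweight retrieval model in this spirit classifies a document by which genre's core vocabulary it concentrates on, trading expressive features for real-time efficiency.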
A keyquery-based classification system for CORE
We apply keyquery-based taxonomy composition to compute a classification system for the CORE dataset, a shared crawl of about 850,000 scientific papers. Keyquery-based taxonomy composition can be understood as a two-phase hierarchical document clustering technique that utilizes search queries as cluster labels: In a first phase, the document collection is indexed by a reference search engine, and the documents are tagged with the search queries for which they are relevant—their so-called keyqueries. In a second phase, a hierarchical clustering is formed from the keyqueries within an iterative process. We use the explicit topic model ESA as document retrieval model in order to index the CORE dataset in the reference search engine. Under the ESA retrieval model, documents are represented as vectors of similarities to Wikipedia articles, a methodology proven to be advantageous for text categorization tasks. Our paper presents the generated taxonomy and reports on quantitative properties such as document coverage and processing requirements.
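The ESA representation used for indexing can be sketched as a vector of similarities to concept articles. The two toy "Wikipedia articles" below are hypothetical placeholders, and plain term-frequency cosine similarity stands in for the full ESA weighting.

```python
import math
from collections import Counter

def tf(text):
    """Term-frequency vector of a text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical stand-ins for Wikipedia concept articles:
concepts = {
    "Machine learning": "learning algorithms train models from data",
    "Astronomy": "telescopes observe stars planets and galaxies",
}

def esa_vector(doc):
    """ESA-style representation: the document as a vector of
    similarities to the concept articles."""
    d = tf(doc)
    return {name: cosine(d, tf(text)) for name, text in concepts.items()}

v = esa_vector("neural models learn from training data")
print(v)
```

In the paper's pipeline, such vectors feed the reference search engine, so that a document's keyqueries are the queries under which its ESA vector ranks it among the top results.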